Adaptive Operator Selection in EAs with Extreme-Dynamic Multi-Armed Bandits

Authors

  • Álvaro Fialho
  • Marc Schoenauer
Abstract

The performance of evolutionary algorithms is highly affected by the choice of the variation operators used to solve the problem at hand. This paper presents a brief review of results recently obtained with the "Extreme Dynamic Multi-Armed Bandit" (Ex-DMAB), a technique that automatically selects, while searching for the solution, which of the available operators to apply. Experiments on three well-known unimodal artificial problems from the EC community, namely OneMax, Long k-Path and Royal Road, and on a set of SAT instances, are briefly presented; they demonstrate improvements over both any single operator used alone and the naive uniform choice of an operator at each application.

1 Adaptive Operator Selection

Adaptive methods use information from the history of the evolution to modify parameters while solving the problem. This paper focuses on Adaptive Operator Selection (AOS), i.e., the definition of an on-line strategy able to autonomously select between different variation operators each time one needs to be applied. Fig. 1 illustrates the general scheme for achieving this goal, from which we can derive the two main components that need to be defined: the Credit Assignment, i.e., how to assess the performance of each operator based on the impact of its application on the progress of the search; and the Operator Selection rule, i.e., how to select between the different operators based on the rewards they have received so far.

Figure 1: General Adaptive Operator Selection scheme.

2 Extreme Dynamic Multi-Armed Bandit

The two ingredients of the Adaptive Operator Selection method we propose are: an Operator Selection rule based on the Multi-Armed Bandit paradigm, and a Credit Assignment mechanism based on extreme values.

2.1 Operator Selection: Dynamic Multi-Armed Bandits

The idea explored here, first proposed in [3], is that the selection of an operator can be seen as yet another Exploration vs. Exploitation dilemma, this time at the operator-selection level: the operator known to have brought the best results so far should be applied as often as possible, while the other operators are nevertheless still explored, in case one of them becomes the best option at some point. This dilemma has been intensively studied in the context of Game Theory, within the so-called Multi-Armed Bandit (MAB) framework. Among the existing MAB variants, the Upper Confidence Bound (UCB) algorithm [1] was chosen, as it is provably optimal with respect to the maximization of the cumulative reward. More formally, the UCB algorithm works as follows. Each variation operator is viewed as an arm of a MAB problem. Let ni,t denote the number of times arm i has been played up to time t, and let p̂i,t denote the empirical average reward received by arm i up to time t. At each time step t, the algorithm selects the arm maximizing the following quantity:

    p̂i,t + C · sqrt( 2 · log( Σj nj,t ) / ni,t )

where C is a scaling factor balancing exploration and exploitation.
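The combination described above, UCB-based Operator Selection with an extreme-value Credit Assignment, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the class and parameter names (ExDMAB, scale_c, window) are ours, and the change-detection restart mechanism that makes the full bandit "dynamic" is omitted.

```python
import math
from collections import deque

class ExDMAB:
    """Sketch: UCB operator selection with extreme-value credit assignment.

    Simplified for illustration; the full Ex-DMAB additionally restarts
    the bandit when a change in operator quality is detected.
    """

    def __init__(self, n_ops, scale_c=1.0, window=50):
        self.n_ops = n_ops
        self.c = scale_c                    # exploration scaling factor C
        self.counts = [0] * n_ops           # n_{i,t}: times operator i was applied
        self.rewards = [0.0] * n_ops        # empirical average reward p-hat_{i,t}
        # recent fitness improvements per operator, for the extreme-value credit
        self.windows = [deque(maxlen=window) for _ in range(n_ops)]

    def select(self):
        """Return the index of the operator to apply next (UCB rule)."""
        # play every arm once before applying the UCB formula
        for i in range(self.n_ops):
            if self.counts[i] == 0:
                return i
        total = sum(self.counts)
        # argmax of: p-hat_{i,t} + C * sqrt(2 * log(sum_j n_{j,t}) / n_{i,t})
        return max(
            range(self.n_ops),
            key=lambda i: self.rewards[i]
            + self.c * math.sqrt(2.0 * math.log(total) / self.counts[i]),
        )

    def update(self, op, fitness_improvement):
        """Credit operator `op` with the extreme (max) recent improvement."""
        self.windows[op].append(max(0.0, fitness_improvement))
        credit = max(self.windows[op])      # extreme value over the window
        self.counts[op] += 1
        # incremental update of the empirical average reward
        self.rewards[op] += (credit - self.rewards[op]) / self.counts[op]
```

In an EA loop, select() would be called each time a variation operator is needed, and update() would be called after evaluating the resulting offspring, passing the raw fitness improvement it produced.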


Similar articles

Dynamic Multi-Armed Bandits and Extreme Value-Based Rewards for Adaptive Operator Selection in Evolutionary Algorithms

The performance of many efficient algorithms critically depends on the tuning of their parameters, which in turn depends on the problem at hand; e.g., the performance of Evolutionary Algorithms critically depends on the judicious setting of the operator rates. The Adaptive Operator Selection (AOS) heuristic that is proposed here rewards each operator based on the extreme value of the fitness im...


Extreme Compass and Dynamic Multi-Armed Bandit for Adaptive Operator Selection

The goal of Adaptive Operator Selection is the on-line control of the choice of variation operators within Evolutionary Algorithms. The control process is based on two main components, the credit assignment, that defines the reward that will be used to evaluate the quality of an operator after it has been applied, and the operator selection mechanism, that selects one operator based on all oper...


Extreme Value Based Adaptive Operator Selection

Credit Assignment is an important ingredient of several proposals that have been made for Adaptive Operator Selection. Instead of the average fitness improvement of newborn offspring, this paper proposes to use some empirical order statistics of those improvements, arguing that rare but highly beneficial jumps matter as much or more than frequent but small improvements. An extreme value based C...


Adaptive Operator Selection for Optimization

Evolutionary Algorithms (EAs) are stochastic optimization algorithms which have already shown their efficiency on many application domains. This is achieved mainly due to the many parameters that can be defined by the user according to the problem at hand. However, the performance of EAs is very sensitive to the setting of these parameters, and there are no general guidelines for an efficient s...


On Adaptive Estimation for Dynamic Bernoulli Bandits

The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm is associated with a reward distribution. In static MABs, the reward distributions do not change over time, while in dynamic MABs, each arm’s reward distrib...




Publication date: 2009